Here we will map an example dataset from Roy et al (Cell Rep 2019) comprising 12,245 CD34+ cells from fetal and adult bone marrow. Note that any human hematopoietic cells, regardless of tissue source or disease status, can be mapped to this reference. This entire tutorial should take less than 10 minutes to run.
library(Seurat)
library(tidyverse)
── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.5.1 ✔ purrr 1.0.1
✔ tibble 3.2.1 ✔ dplyr 1.1.4
✔ tidyr 1.3.0 ✔ stringr 1.5.0
✔ readr 1.4.0 ✔ forcats 0.5.1
Warning: package ‘tibble’ was built under R version 4.1.2Warning: package ‘tidyr’ was built under R version 4.1.2Warning: package ‘purrr’ was built under R version 4.1.2Warning: package ‘stringr’ was built under R version 4.1.2── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(symphony)
Warning: package ‘symphony’ was built under R version 4.1.2
library(ggpubr)
Warning: package ‘ggpubr’ was built under R version 4.1.2
library(patchwork)
Install package from github
## install dependencies that are not on CRAN
if(!require(BiocManager, quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("AUCell", "doMC", "BiocNeighbors"))
if(!require(devtools, quietly = TRUE)) install.packages("devtools")
devtools::install_github("jaredhuling/jcolors")
## install BoneMarrowMap package
devtools::install_github('andygxzeng/BoneMarrowMap', force = TRUE)
library(BoneMarrowMap)
Set projection folder, Download reference object and UMAP model
# Set directory to store projection reference files
projection_path = './'
# Download Bone Marrow Reference - 344 Mb
curl::curl_download('https://bonemarrowmap.s3.us-east-2.amazonaws.com/BoneMarrowMap_SymphonyReference.rds',
destfile = paste0(projection_path, 'BoneMarrowMap_SymphonyReference.rds'))
# Download uwot model file - 221 Mb
curl::curl_download('https://bonemarrowmap.s3.us-east-2.amazonaws.com/BoneMarrow_RefMap_uwot_model.uwot',
destfile = paste0(projection_path, 'BoneMarrowMap_uwot_model.uwot'))
projection_path = './'
# Load Symphony reference
ref <- readRDS(paste0(projection_path, 'BoneMarrowMap_SymphonyReference.rds'))
# Set uwot path for UMAP projection
ref$save_uwot_path <- paste0(projection_path, 'BoneMarrowMap_uwot_model.uwot')
If we want to visualize celltype labels or metadata from the BM Reference, we can create a Seurat Object from the symphony reference This will be memory efficient as it will not include gene expression counts, only the UMAP coordinates and the metadata including cell labels and sorting information
ReferenceSeuratObj <- create_ReferenceObject(ref)
DimPlot(ReferenceSeuratObj, reduction = 'umap', group.by = 'CellType_Annotation_formatted',
raster=FALSE, label=TRUE, label.size = 4) + NoAxes()
We can also visualize broader cell type labels which may simplify interpretation and downstream analysis.
DimPlot(ReferenceSeuratObj, reduction = 'umap', group.by = 'CellType_Broad',
raster=FALSE, label=TRUE, label.size = 4)
We can visualize other annotations too, including cell cycle phase and lineage pseudotime estimates.
p1 <- DimPlot(ReferenceSeuratObj, reduction = 'umap', group.by = 'CyclePhase', raster=FALSE) + NoAxes()
p2 <- FeaturePlot(ReferenceSeuratObj, reduction = 'umap', features = 'Pseudotime', raster=FALSE) + NoAxes()
p1 + p2
Example Query object is a subset of data from Roy et al (Cell Reports 2021), incorporating CD34p cells from an adult bone marrow sample and a fetal bone marrow sample. Here, we prefer raw count data without normalization.
Input does not need to be a seurat object, can load raw count matrix and metadata separately
# Load example data from Roy et al - 141 Mb
curl::curl_download('https://bonemarrowmap.s3.us-east-2.amazonaws.com/ExampleQuery_Roy2021.rds',
destfile = paste0(projection_path, 'ExampleQuery_Roy2021.rds'))
# Load seurat object
query <- readRDS(paste0(projection_path, 'ExampleQuery_Roy2021.rds'))
query
An object of class Seurat
33538 features across 12245 samples within 1 assay
Active assay: RNA (33538 features, 0 variable features)
Provide raw counts, metadata, and donor key. This should take <1 min. Calculate mapping error and perform QC to remove low quality cells with high mapping error
# batch variable to correct in the query data, set as NULL if no batches in query
batchvar <- 'sampleID'
# Map query dataset using Symphony (Kang et al 2021)
query <- map_Query(
exp_query = query[['RNA']]@counts,
metadata_query = query@meta.data,
ref_obj = ref,
vars = batchvar
)
Normalizing
Scaling and synchronizing query gene expression
Found 2386 reference variable genes in query dataset
Project query cells using reference gene loadings
Clustering query cells to reference centroids
Correcting query batch effects
UMAP
All done!
Now that the data is mapped, we will evaluate the mapping QC metrics and flag cells with high mapping errors
# Run QC based on mapping error score, flag cells with mapping error >= 2.5 MADs above median
query <- query %>% calculate_MappingError(., reference = ref, MAD_threshold = 2.5)
# Get QC Plots
QC_plots <- plot_MappingErrorQC(query)
# Plot together - If this is too crowded, can also just call "QC_plots" aloneto display one by one
patchwork::wrap_plots(QC_plots, ncol = 4, widths = c(0.8, 0.3, 0.8, 0.3))
This important step identifies a subset of cells with high mapping error from the query dataset that are either:
Sometimes, low quality cells may erroneously map to the orthochromatic erythroblast region as this cell type has very low transcriptional diversity. These low quality query cells do not have hemoglobin expression and are in fact mis-mapped; they will be flagged by the QC filter and excluded from cell type assignments.
Please adjust the MAD_threshold (typically between 1 and 3) based on the distribution of your dataset to identify the outliers with low quality and high mapping error scores. This will improve your classifications and any downstream composition analysis
# # Optional step - remove outliers with high mapping error
# query <- subset(query, mapping_error_QC == 'Pass')
Optionally, outlier cells with high mapping error can also be removed at this stage. For ease of integrating these mapped annotations with the rest of your analysis, we also choose to skip this step. If so, Final CellType and Pseudotime predictions will be assigned as NA for cells failing the mapping error QC threshold.
We will next use a KNN classifier to assign cell identity based on the 30 K-Nearest Neighbours from the reference map. Broader cell type labels will also be transferred automatically along with the precise cell type labels. This label transfer step should take <1 min for 10,000 cells and <5 min for 100,000 cells.
# Predict Hematopoietic Cell Types by KNN classification
query <- predict_CellTypes(
query_obj = query,
ref_obj = ref,
initial_label = 'initial_CellType', # celltype assignments before filtering on mapping QC
final_label = 'predicted_CellType' # celltype assignments with map QC failing cells assigned as NA
)
DimPlot(subset(query, mapping_error_QC == 'Pass'), reduction = 'umap_projected', group.by = c('predicted_CellType'),
raster=FALSE, label=TRUE, label.size = 4)
We can also visualize the broader cell type categories in case precise labels are too granular. These provide a simpler view of the data and can help guide cluster annotations if your dataset does not have many cells.
DimPlot(subset(query, mapping_error_QC == 'Pass'), reduction = 'umap_projected', group.by = c('predicted_CellType_Broad'),
raster=FALSE, label=TRUE, label.size = 4)
We can also annotate each query cell based on their position along hematopoietic pseudotime. Query cells will be assigned a pseudotime score based on the 30 K-Nearest Neighbours from the reference map. Since our Pseudotime KNN assignments are performed in UMAP space (more accurate than KNN on harmony components), this step is very fast (< 10s)
# Predict Pseudotime values by KNN
query <- predict_Pseudotime(
query_obj = query,
ref_obj = ref,
initial_label = 'initial_Pseudotime', # pseudotime assignments before filtering on mapping QC
final_label = 'predicted_Pseudotime' # pseudotime assignments with map QC failing cells assigned as NA
)
# Visualize Hematopoietic Pseudotime in query data
FeaturePlot(subset(query, mapping_error_QC == 'Pass'), features = c('predicted_Pseudotime'))
This will save a csv file with the mapped annotations for each cell (mapping error scores, umap coordinates, predicted Cell Type, and predicted Pseudotime)
# Save CellType Annotations and Projected UMAP coordinates
save_ProjectionResults(
query_obj = query,
celltype_label = 'predicted_CellType',
celltype_KNNprob_label = 'predicted_CellType_prob',
pseudotime_label = 'predicted_Pseudotime',
file_name = 'querydata_projected_labeled.csv')
Now let’s visualize the density distribution of query cells across the hematopoietic hierarchy
# Set batch/condition to be visualized individually
batch_key <- 'sampleID'
# returns a list of plots for each donor from a pre-specified batch variable
projection_plots <- plot_Projection_byDonor(
query_obj = query,
batch_key = batch_key,
ref_obj = ref,
Hierarchy_only = FALSE, # Whether to exclude T/NK/Plasma/Stromal cells
downsample_reference = TRUE,
downsample_frac = 0.25, # down-sample reference cells to 25%; reduces figure file size
query_point_size = 0.2, # adjust size of query cells based on # of cells
saveplot = TRUE,
save_folder = 'projectionFigures/'
)
# show plots together with patchwork. Can also just call "projection_plots" object to display one-by-one
patchwork::wrap_plots(projection_plots, ncol = 2)
We can also set Hierarchy_only = TRUE to remove T/NK/Plasma/Stromal cells and focus solely on the hematopoietic hierarchy.
# Set batch/condition to be visualized individually
batch_key <- 'sampleID'
# returns a list of plots for each donor from a pre-specified batch variable
projection_plots <- plot_Projection_byDonor(
query_obj = query,
batch_key = batch_key,
ref_obj = ref,
Hierarchy_only = TRUE, # Whether to exclude T/NK/Plasma/Stromal cells
downsample_reference = TRUE,
downsample_frac = 0.25, # down-sample reference cells to 25%; reduces figure file size
query_point_size = 0.2, # adjust size of query cells based on # of cells
saveplot = TRUE,
save_folder = 'projectionFigures/'
)
# show plots together with patchwork. Can also just call "projection_plots" object to display one-by-one
patchwork::wrap_plots(projection_plots, ncol = 2)
Note how cell composition differs between CD34p sorted Adult Bone Marrow and Fetal Bone Marrow within the Roy 2021 dataset. Notably, Adult bone marrow is enriched for more early HSC/MPP and LMPP cells and also MEP & Early Erythroid lineages. In contrast, Fetal bone marrow is highly enriched for MLP and B cell progenitors.
We can also compare the mapping of these samples along Hematopoietic pseudotime:
Here, to study the abundance of each cell type within each donor, I focus on cells that were classified with a KNN prob > 0.5 (that is, >50% of nearest neighbours from the reference map agree on the assigned cell type).
We can present this as a long table
query_composition <- get_Composition(
query_obj = query,
donor_key = 'sampleID',
celltype_label = 'predicted_CellType',
mapQC_col = 'mapping_error_QC',
knn_prob_cutoff = NULL,
return_type = 'long')
query_composition
Or as a wide table with counts of # of cells
query_composition <- get_Composition(
query_obj = query,
donor_key = 'sampleID',
celltype_label = 'predicted_CellType',
mapQC_col = 'mapping_error_QC',
knn_prob_cutoff = NULL,
return_type = 'count')
query_composition
Or as a wide table with proportion of each cell type within each donor
query_composition <- get_Composition(
query_obj = query,
donor_key = 'sampleID',
celltype_label = 'predicted_CellType',
mapQC_col = 'mapping_error_QC',
knn_prob_cutoff = NULL,
return_type = 'proportion')
query_composition
# Simple heatmap to visualize composition of projected samples
p <- query_composition %>% column_to_rownames('sampleID') %>% data.matrix() %>% ComplexHeatmap::Heatmap()
p
Now you can perform standard dimensionality reduction and clustering on your sample in an unsupervised manner and use the projected labels to guide your analysis and cell type annotations. Here we will start with a basic dimensionality reduction to visualize how the CD34+ cells are organized within these samples.
query <- query %>%
SCTransform() %>%
RunPCA() %>%
harmony::RunHarmony(group.by.vars = 'sampleID') %>%
RunUMAP(reduction = 'harmony', dims = 1:20)
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 19330 by 12245
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
|
| | 0%
|
|===================================================== | 25%
|
|========================================================================================================== | 50%
|
|=============================================================================================================================================================== | 75%
|
|====================================================================================================================================================================================================================| 100%
There are 3 estimated thetas smaller than 1e-07 - will be set to 1e-07
Found 91 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 19330 genes
|
| | 0%
|
|===== | 3%
|
|=========== | 5%
|
|================ | 8%
|
|====================== | 10%
|
|=========================== | 13%
|
|================================= | 15%
|
|====================================== | 18%
|
|=========================================== | 21%
|
|================================================= | 23%
|
|====================================================== | 26%
|
|============================================================ | 28%
|
|================================================================= | 31%
|
|======================================================================= | 33%
|
|============================================================================ | 36%
|
|================================================================================== | 38%
|
|======================================================================================= | 41%
|
|============================================================================================ | 44%
|
|================================================================================================== | 46%
|
|======================================================================================================= | 49%
|
|============================================================================================================= | 51%
|
|================================================================================================================== | 54%
|
|======================================================================================================================== | 56%
|
|============================================================================================================================= | 59%
|
|================================================================================================================================== | 62%
|
|======================================================================================================================================== | 64%
|
|============================================================================================================================================= | 67%
|
|=================================================================================================================================================== | 69%
|
|======================================================================================================================================================== | 72%
|
|============================================================================================================================================================== | 74%
|
|=================================================================================================================================================================== | 77%
|
|========================================================================================================================================================================= | 79%
|
|============================================================================================================================================================================== | 82%
|
|=================================================================================================================================================================================== | 85%
|
|========================================================================================================================================================================================= | 87%
|
|============================================================================================================================================================================================== | 90%
|
|==================================================================================================================================================================================================== | 92%
|
|========================================================================================================================================================================================================= | 95%
|
|=============================================================================================================================================================================================================== | 97%
|
|====================================================================================================================================================================================================================| 100%
Computing corrected count matrix for 19330 genes
|
| | 0%
|
|===== | 3%
|
|=========== | 5%
|
|================ | 8%
|
|====================== | 10%
|
|=========================== | 13%
|
|================================= | 15%
|
|====================================== | 18%
|
|=========================================== | 21%
|
|================================================= | 23%
|
|====================================================== | 26%
|
|============================================================ | 28%
|
|================================================================= | 31%
|
|======================================================================= | 33%
|
|============================================================================ | 36%
|
|================================================================================== | 38%
|
|======================================================================================= | 41%
|
|============================================================================================ | 44%
|
|================================================================================================== | 46%
|
|======================================================================================================= | 49%
|
|============================================================================================================= | 51%
|
|================================================================================================================== | 54%
|
|======================================================================================================================== | 56%
|
|============================================================================================================================= | 59%
|
|================================================================================================================================== | 62%
|
|======================================================================================================================================== | 64%
|
|============================================================================================================================================= | 67%
|
|=================================================================================================================================================== | 69%
|
|======================================================================================================================================================== | 72%
|
|============================================================================================================================================================== | 74%
|
|=================================================================================================================================================================== | 77%
|
|========================================================================================================================================================================= | 79%
|
|============================================================================================================================================================================== | 82%
|
|=================================================================================================================================================================================== | 85%
|
|========================================================================================================================================================================================= | 87%
|
|============================================================================================================================================================================================== | 90%
|
|==================================================================================================================================================================================================== | 92%
|
|========================================================================================================================================================================================================= | 95%
|
|=============================================================================================================================================================================================================== | 97%
|
|====================================================================================================================================================================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 1.307162 mins
Determine variable features
Place corrected count matrix in counts slot
Centering data matrix
|
| | 0%
|
|===================================================== | 25%
|
|========================================================================================================== | 50%
|
|=============================================================================================================================================================== | 75%
|
|====================================================================================================================================================================================================================| 100%
Set default assay to SCT
PC_ 1
Positive: MPO, AREG, AZU1, PRTN3, ELANE, LYZ, SRGN, H1FX, CTSG, NEAT1
RPLP1, RPS2, RPL13, CFD, AVP, RPS12, RPS24, EREG, JUND, SERPINB1
RPS18, MT-ATP6, FTH1, TPT1, SOCS3, RPS14, RPS8, CEBPD, MT-CO3, RPL12
Negative: VPREB3, LTB, CD79B, VPREB1, CD24, DNTT, IGLL1, AKAP12, TMSB4X, HIST1H4C
HMGB2, DUSP1, IGHM, STMN1, TUBB, TUBA1B, UBE2C, CD9, HSPA8, CD79A
CTGF, RRM2, JUN, PTMA, CMTM8, UHRF1, DYNLL1, HMGN2, NUSAP1, CCND3
PC_ 2
Positive: MPO, PRTN3, AZU1, LYZ, CXCL8, SRGN, MS4A3, CTSG, CCL3, ANXA1
ELANE, CFD, CALR, CLEC11A, C1QTNF4, LGALS1, RNASE2, HSPA5, CSTA, RGCC
DLK1, TMSB4X, HSP90B1, CCL4, PRSS57, CEBPD, KLF6, CYBA, SDF2L1, PLEK
Negative: HBB, AVP, H1FX, CA1, HBD, BLVRB, APOC1, PRDX2, FAM178B, S100A6
AHSP, JUND, APOE, RPS2, TPT1, SOCS3, TXNIP, KCNH2, MT-ATP6, GNAS
HBA1, ZFP36L2, ALDH1A1, HIST1H1D, MT-CYB, FAM30A, PRKAR2B, FTH1, PDZD8, EEF1A1
PC_ 3
Positive: HBB, CA1, BLVRB, APOC1, FAM178B, HBD, S100A6, AHSP, PRDX2, APOE
KCNH2, CXCL8, HBA1, EPCAM, ATP5IF1, S100A4, NMU, MS4A3, HSPA5, CNRIP1
CCL3, PLIN2, TMEM14C, HIST1H4C, CD36, PRKAR2B, ANXA1, CA2, TFR2, RHAG
Negative: AREG, AVP, SOCS3, ZFP36L2, IGHM, CD74, NEAT1, JUND, DNTT, HLA-DRA
HLA-DRB1, TPT1, DDIT4, HLA-DPB1, CD79B, HLA-DPA1, VIM, MAP3K8, HLA-B, CEBPB
HLA-E, BEX1, HLA-DQB1, JCHAIN, VPREB3, GRASP, VPREB1, PIK3R1, JUN, SPINK2
PC_ 4
Positive: CXCL8, CLC, CCL3, MS4A3, TUBA1A, ANXA1, RGCC, HSPA5, HSPA1A, MEG3
AVP, CCL4, HPGDS, IL5RA, CXCL3, HDC, KLF6, RGS1, AC020916.1, RHEX
EPX, ALOX5AP, NFKBIA, LMO4, CXCL2, DLK1, PTX3, CCL3L1, RPS27, IL1B
Negative: ELANE, MPO, AREG, AZU1, PRTN3, LYZ, HMGB2, CTSG, HBB, CEBPD
TUBA1B, DNTT, H1FX, CFD, CD24, VPREB3, HIST1H4C, CA1, BLVRB, CD79B
IGLL1, AHSP, CST7, S100A6, PTMA, MKI67, VPREB1, TOP2A, AC020656.1, HMGN2
PC_ 5
Positive: LGALS1, JCHAIN, CST3, IRF8, PLD4, IGKC, GPR183, SAMHD1, ALOX5AP, TGFBI
CCDC50, S100A10, VIM, UGCG, ANXA2, RAB11FIP1, C12orf75, HLA-DRB1, LYZ, S100A6
CD74, ARL4C, FCER1G, HERPUD1, IRF7, PLP2, IL3RA, S100A4, TYROBP, SPIB
Negative: PRTN3, AZU1, CD79B, VPREB3, CD24, MPO, DNTT, AKAP12, VPREB1, CD9
CTSG, IGLL1, ELANE, CMTM8, AVP, C1QTNF4, SERPINB1, RPS18, GYPC, ARPP21
CTGF, RPL13, RPL34, LCN6, RPL13A, CST7, CALR, LINC01013, DLK1, RPL7
Harmony 1/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 2/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 3/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 4/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 5/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 6/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 7/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony 8/10
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Harmony converged after 8 iterations
Warning: Cannot add objects with duplicate keys (offending key: harmony_), setting key to 'harmonyqv_'Warning: Invalid name supplied, making object name syntactically valid. New object name is Seurat..ProjectDim.SCT.harmony; see ?make.names for more details on syntax validityWarning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session19:19:54 UMAP embedding parameters a = 0.9922 b = 1.112
19:19:54 Read 12245 rows and found 20 numeric columns
19:19:54 Using Annoy for neighbor search, n_neighbors = 30
19:19:54 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
19:19:55 Writing NN index file to temp file /var/folders/2b/f9nktt6d0wzfcpxv5xxm2n900000gp/T//RtmpGCyUkO/file11ec1e117a32
19:19:55 Searching Annoy index using 1 thread, search_k = 3000
19:19:58 Annoy recall = 100%
19:19:58 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
19:20:00 Initializing from normalized Laplacian + noise (using irlba)
19:20:00 Commencing optimization for 200 epochs, with 515168 positive edges
Using method 'umap'
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
19:20:07 Optimization finished
DimPlot(query, reduction = 'umap', group.by = c('predicted_CellType'), label=TRUE, label.size = 4)
p1 <- DimPlot(query, reduction = 'umap', group.by = c('sampleID'), label=TRUE, label.size = 4)
p2 <- FeaturePlot(subset(query, mapping_error_QC == 'Pass'), reduction = 'umap', features = c('predicted_Pseudotime'), label.size = 4, max.cutoff = 'q95')
p1 | p2
We can identify a clear cluster of HSCs and observe maturation along the lymphoid, myeloid, and Mk/Ery lineages based on our projected Pseudotime estimates. To simplify downstream analysis, we can also visualize broader cell type labels below on the new umap as well as the projected UMAP.
p1 <- DimPlot(query, reduction = 'umap', group.by = c('predicted_CellType_Broad'), label=TRUE, label.size = 4)
p2 <- DimPlot(query, reduction = 'umap_projected', group.by = c('predicted_CellType_Broad'), label=TRUE, label.size = 4)
p1 | p2